Chinese Word Similarity Measurement

Authors

  • Yunfang Wu
  • Wei Li
Abstract

Word similarity computation is a fundamental task in natural language processing. We organized an evaluation campaign on Chinese word similarity measurement at NLPCC-ICCPOL 2016. The task provides a benchmark dataset of Chinese word similarity, the PKU-500 dataset, which contains 500 word pairs with their similarity scores. Twenty-one teams submitted 24 systems to the campaign. In this paper, we describe the data preparation and word similarity annotation, analyze the evaluation results in depth, and briefly introduce the participating systems.

Similar resources

Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet

Semantic similarity measurement of multilingual words is a challenging problem in data mining, information extraction, information retrieval, etc. This paper introduces an algorithm to measure the semantic similarity of Chinese-English bilingual words based on Chinese WordNet, an expansion of WordNet in Simplified Chinese. The algorithm not only measures the semantic similarity for Chinese and ...

Improve Chinese Word Embeddings by Exploiting Internal Structure

Recently, researchers have demonstrated that both a Chinese word and its component characters provide rich semantic information when learning Chinese word embeddings. However, they ignored the semantic similarity across component characters in a word. In this paper, we learn the semantic contribution of characters to a word by exploiting the similarity between a word and its component characters ...

Word Semantic Similarity Calculation Based on Domain Knowledge and HowNet

Word semantic similarity is the foundation of semantic processing, and is a key issue in many applications. This paper argues that word semantic similarity should associate with domain knowledge, which traditional methods did not take into account. In order to adopt domain knowledge into semantic similarity measurement, this paper proposed a sensitive words sets approach. For this purpose, we a...

Chinese Entity Relation Extraction Based on Word Co-occurrence

Chinese entity relation extraction is a part of entity relation extraction. According to entity relation extraction technology and the features of Chinese news corpus, this paper proposes a novel method for Chinese entities relation extraction. The method, named WCORE (word co-occurrence relation extraction), first measures the semantic similarity by word co-occurrence and then adopts pattern m...

The Research of Chinese Semantic Similarity Calculation Introduced Punctuations

So far, most Chinese natural language processing neglects the punctuations or oversimplifies their functions. To improve the efficiency of Chinese similarity computing, this paper gives a Chinese similarity computing system model in accordance with the problems of Chinese sentence similarity computation aspect. This model is a combination of punctuations and traditional similarity computing. Co...

Publication date: 2016